Abstract

We study the loss in objective value when an inaccurate objective is optimized instead of the 
true one, and show that "on average" this loss is very small, for an arbitrary compact feasible 
region. 

1 Introduction 

This paper is concerned with the loss in objective value incurred when an inaccurate objective, because 
of either uncertainty or misspecification, is optimized instead of the true one. 

Consider the following model case. Instead of the true objective $w^T x$, the nominal objective $v^T x$ is
maximized over the unit ball in $\mathbb{R}^n$, where $w$ and $v$ are unit vectors making an angle $\alpha$. (Throughout
the paper, we assume $0 < \alpha < \pi/2$.) Then the computed optimal solution is $x = v$, attaining a
true objective value of $\cos\alpha$. Since the true optimal value is 1, the loss is $1 - \cos\alpha$, but to make the
measure scale-invariant, we divide by the range of the true objective over the feasible region, which is
2 (from $-1$ to $+1$). The scaled loss is thus $(1 - \cos\alpha)/2$: see Figure 1. Our main result claims that
this formula for the scaled loss holds "on average" for any compact feasible region. Since this result on
the robustness of the optimal value to misspecification of the objective holds for any feasible region,
we call it a robust robust optimization result.
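
The model case can be checked numerically. The following is a minimal sketch, assuming we work in the plane spanned by $v$ and $w$ (all other coordinates being zero); the function name `model_case_scaled_loss` is ours and purely illustrative.

```python
import math

def model_case_scaled_loss(alpha: float) -> float:
    """Scaled loss when v^T x is maximized over the unit ball instead of w^T x."""
    # Work in the plane spanned by v and w; the remaining coordinates are zero.
    v = (0.0, 1.0)
    w = (math.sin(alpha), math.cos(alpha))
    x = v  # maximizer of v^T x over the unit ball
    true_value = w[0] * x[0] + w[1] * x[1]  # equals cos(alpha)
    loss = 1.0 - true_value  # the true optimal value is ||w|| = 1
    value_range = 2.0  # w^T x runs from -1 to +1 over the ball
    return loss / value_range

for degrees in (5, 15, 30, 60):
    alpha = math.radians(degrees)
    print(degrees, round(model_case_scaled_loss(alpha), 5))
```

For small angles the printed values are small: a 15-degree error in the objective costs under 2% of the range in this model case.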

Robust optimization has been much studied over the last fifteen years: see, e.g., [2], [3], [4], [5], [6].
Usually there is uncertainty in the constraints as well as the objective, and the goal is to find a decision 
vector that is feasible regardless of the realization of the constraints and that achieves a guaranteed 
performance regardless of the realization of the objective. Typically this leads to an optimization 
problem that is harder than the deterministic version of the problem. Our concerns are appropriate 
when the decision maker is oblivious to the error in the objective and does not protect against a 
possible misspecification. 

In Section 2 we define our setting and give a worst-case bound on the scaled loss for a class of 
feasible regions. Section 3 describes two probability distributions for the true and nominal objectives 
and obtains our probabilistic result; we also explain why "on average" is in quotes above. Finally, in 
Section 4 we discuss the result and outline two applications. 

Figure 1: Scaled loss: model case. The loss is the red segment; the range the sum of the red and green
segments.

2 Definitions and Worst-Case Results 

Let $C \subseteq \mathbb{R}^n$ be compact and nonempty. Since we are interested in maximizing linear functions over $C$,
we could assume without loss of generality that C is convex, by replacing it if necessary by its convex 
hull. However, in Section 4 we treat optimization over a nonlinear transformation of a compact set 
and over a set of binary vectors, and so we prefer not to restrict C further. 

Definition 1. For $v \in \mathbb{R}^n$, we define

\[
\begin{aligned}
\max(v) &:= \max\{v^T x : x \in C\},\\
\min(v) &:= \min\{v^T x : x \in C\}, \text{ and}\\
\operatorname{range}(v) &:= \max(v) - \min(v).
\end{aligned}
\]

Now consider two objectives, the true objective $w^T x$ and the nominal objective $v^T x$. If we maximize
$v^T x$ over $C$, the optimal solution set is $\{x \in C : v^T x = \max(v)\}$, and if this is not a singleton, we
might be unlucky and choose the worst $x$ as far as the true objective is concerned. Hence we make

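Definition 2. For $v, w \in \mathbb{R}^n$, define

\[
\operatorname{loss}(v, w) := \max(w) - \min\{w^T x : x \in C,\ v^T x = \max(v)\}
\]

and

\[
\operatorname{scaled\_loss}(v, w) := \frac{\operatorname{loss}(v, w)}{\operatorname{range}(w)}.
\]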

Note that the scaled loss is invariant to translations or dilations of C, and even to rotations if v 
and w are correspondingly rotated. 

As we have seen, if $C = B^n := \{x \in \mathbb{R}^n : \|x\| \le 1\}$ (all norms are Euclidean), $v = (0; 1; 0; \ldots; 0)$,
and $w = (\sin\alpha; \cos\alpha; 0; \ldots; 0)$, then $\operatorname{loss}(v, w) = 1 - \cos\alpha$ and $\operatorname{scaled\_loss}(v, w) = (1 - \cos\alpha)/2$. On
the other hand, if $C$ is the convex hull of $(-1; 0; \ldots; 0)$ and $(+1; 0; \ldots; 0)$ and $v$ and $w$ are as above,
then $\operatorname{loss}(v, w) = \operatorname{range}(w) = 2\sin\alpha$ and the scaled loss is 1, as bad as it can be.
Note that in this example, the optimal solution set for $v^T x$ is all of $C$, and in accordance with the
definition above, we choose the worst of these optimal solutions with respect to the true objective,
namely $(-1; 0; \ldots; 0)$, in evaluating the loss.
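
Definitions 1 and 2 can be evaluated directly when $C$ is the convex hull of finitely many points, since a linear function attains its extreme values over a polytope, and over any of its faces, at the generating points. The sketch below is an illustration under that assumption; the helper names `dot` and `scaled_loss` and the tie tolerance `tol` are ours. It reproduces the segment example, including the pessimistic choice among tied nominal optima.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def scaled_loss(points, v, w, tol=1e-12):
    """Scaled loss of Definition 2 for C = conv(points), with points finite."""
    max_v = max(dot(v, x) for x in points)
    # All nominal maximizers; ties are resolved pessimistically, as in the text.
    argmax_v = [x for x in points if dot(v, x) >= max_v - tol]
    w_values = [dot(w, x) for x in points]
    max_w, min_w = max(w_values), min(w_values)
    loss = max_w - min(dot(w, x) for x in argmax_v)
    return loss / (max_w - min_w)

alpha = math.radians(20)
v = (0.0, 1.0)
w = (math.sin(alpha), math.cos(alpha))
segment = [(-1.0, 0.0), (1.0, 0.0)]  # endpoints of the bad example
print(scaled_loss(segment, v, w))    # every point of C is nominally optimal
```

Note that `scaled_loss` divides by the range of $w^T x$ and therefore assumes that this range is nonzero.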

Also, for $n = 2$, we will usually view the nominal objective $v$ as pointing vertically up as in this
example. Observe that there is a subtle bias in this viewpoint. While the decision maker only sees
$v$, and therefore thinks of $v$ as fixed and $w$ (if she thinks of it at all) as a perturbation of $v$, a more
appropriate perspective would regard the true objective $w$ as being generated in some suitable way,
and then $v$ arising as a perturbation of $w$.

In the rest of this section, we obtain a worst-case bound on the scaled loss when C is restricted to 
avoid the situation above. 

Theorem 1. Assume that $C$ is contained in $B^n$ and contains $rB^n$ for some $0 < r < 1$. Let $v$ and
$w$ be two nonzero vectors making an angle $\alpha$, where $\sin\alpha \le r \le \cos\alpha$. Then, with $\rho := \sqrt{1 - r^2}$, we
have

\[
\operatorname{scaled\_loss}(v, w) \le \frac{2\rho\sin\alpha}{r(1 + \cos\alpha) + \rho\sin\alpha},
\]

and this bound is tight.

Proof. First we show that the right-hand side above can be attained. Let $n = 2$ and choose $C$ to be
the convex hull of $(-\rho; r)$, $(\rho; r)$, and $rB^2$. Let $v = (0; 1)$ and $w = (\sin\alpha; \cos\alpha)$. Then the set of
optimal solutions for $v^T x$ is the convex hull of $(-\rho; r)$ and $(\rho; r)$, with the former being worst for $w^T x$.
With our assumption that $r \le \cos\alpha$, the optimal solution for $w^T x$ is $(\rho; r)$, while $r \ge \sin\alpha$ implies
that $w^T x$ is minimized at $-rw$. Hence the loss is $2\rho\sin\alpha$ and the range $r(1 + \cos\alpha) + \rho\sin\alpha$, giving
the scaled loss as indicated. See Figure 2.

Now we need to prove the bound. Given any $C$, $v$, and $w$, we can project $C$ into the plane
spanned by $v$ and $w$, and the projected $C$ will lie in $B^2$ and contain $rB^2$. Hence we can assume that
$n = 2$. By rotating if necessary, we can assume that $v$ and $w$ are as above (note that the scale of
these vectors is immaterial). Let $s := \max\{v^T x : x \in C\} \ge r \ge \sin\alpha$, and let $\sigma := \sqrt{1 - s^2}$. Then
$\min\{w^T x : x \in C,\ v^T x = \max(v)\} \ge -\sigma\sin\alpha + s\cos\alpha \ge 0$ and $\min(w) \le -r$, and so

\[
\operatorname{scaled\_loss}(v, w) \le \frac{\max(w) + \sigma\sin\alpha - s\cos\alpha}{\max(w) + r};
\]

note that the right-hand side is monotonically increasing in $\max(w)$, so substituting an upper bound
for the latter provides a valid upper bound on the scaled loss.
If $s \le \cos\alpha$, then $\max(w) \le \sigma\sin\alpha + s\cos\alpha$, and we deduce

\[
\operatorname{scaled\_loss}(v, w) \le \frac{2\sigma\sin\alpha}{\sigma\sin\alpha + s\cos\alpha + r} \le \frac{2\rho\sin\alpha}{\rho\sin\alpha + r\cos\alpha + r}
\]

as desired.

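On the other hand, if $s > \cos\alpha$ so that $\sigma < \sin\alpha$, then $\max(w) \le 1$ and so

\[
\operatorname{scaled\_loss}(v, w) \le \frac{1 + \sigma\sin\alpha - s\cos\alpha}{1 + r} \le \frac{1 + \sin^2\alpha - \cos^2\alpha}{1 + r} = \frac{2\sin^2\alpha}{1 + r} \le \frac{2\rho\sin\alpha}{r\cos\alpha + \rho\sin\alpha + r}
\]

since $\sin\alpha \le \rho$ and $1 \ge r\cos\alpha + \rho\sin\alpha$. Hence the bound is established in either case.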
□ 

Since $\rho \le \cos\alpha$, the right-hand side above is at most $\sin\alpha/r$, and it approaches this value for small
$r$ and very small $\alpha$. This bound is of order $\alpha/r$, and hence much larger than $(1 - \cos\alpha)/2 \approx \alpha^2/4$,
the value in the model case.
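
The tightness construction and the comparison with the model case can be checked numerically. The sketch below is illustrative only: the function names are ours, and it evaluates the extreme values of $w^T x$ over $C = \operatorname{conv}\{(-\rho; r), (\rho; r), rB^2\}$ using the fact that a linear function over a convex hull is extremized on the generating sets, with values $\pm r$ on the disk $rB^2$ because $\|w\| = 1$.

```python
import math

def theorem1_bound(alpha, r):
    """Right-hand side of Theorem 1."""
    rho = math.sqrt(1.0 - r * r)
    return 2.0 * rho * math.sin(alpha) / (
        r * (1.0 + math.cos(alpha)) + rho * math.sin(alpha))

def tight_example_scaled_loss(alpha, r):
    """Scaled loss for C = conv{(-rho; r), (rho; r), r*B^2} with v = (0; 1)."""
    rho = math.sqrt(1.0 - r * r)
    w = (math.sin(alpha), math.cos(alpha))
    left = -rho * w[0] + r * w[1]   # w^T x at (-rho; r)
    right = rho * w[0] + r * w[1]   # w^T x at (rho; r)
    max_w = max(left, right, r)     # +r is the maximum of w^T x over r*B^2
    min_w = min(left, right, -r)    # -r is the minimum of w^T x over r*B^2
    worst_nominal = min(left, right)  # nominal optimal set is the top segment
    return (max_w - worst_nominal) / (max_w - min_w)

# Pairs (alpha, r) chosen to satisfy sin(alpha) <= r <= cos(alpha).
for alpha, r in ((0.3, 0.6), (0.1, 0.2), (0.01, 0.1)):
    print(alpha, r,
          round(tight_example_scaled_loss(alpha, r), 5),
          round(theorem1_bound(alpha, r), 5),
          round(math.sin(alpha) / r, 5),
          round((1.0 - math.cos(alpha)) / 2.0, 7))
```

The last two printed columns contrast the order-$\alpha/r$ worst case with the order-$\alpha^2$ model case.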

Figure 2: Scaled loss: worst case. The feasible region is the convex hull of the horizontal line segment
and the smaller circle. Again, the loss is the red segment; the range the sum of the red and green
segments.

3 Probabilistic Analysis 

Now we examine how the scaled loss behaves when $v$ and $w$ are generated randomly. We examine two
different probability distributions. We call $v \in \mathbb{R}^n$ a standard Gaussian vector if its components are
independent standard Gaussian random variables, or equivalently if $v \sim N(0, I)$.

Definition 3. We say $(w, v)$ is generated according to Probability Distribution 1 if $w$ and $u$ are
independent standard Gaussian vectors in $\mathbb{R}^n$, and $v := w\cos\alpha + u\sin\alpha$. Expectations with respect to
this distribution are indicated by $E_1$.

We say a random variable $\xi$ depending on $n$ concentrates around a positive constant $\beta$ if, for every
positive $\delta$, the probability that $\xi$ lies between $(1 - \delta)\beta$ and $(1 + \delta)\beta$ converges to 1 as $n \to \infty$.

Proposition 1. The angle between $v$ and $w$ generated according to Probability Distribution 1
concentrates around $\alpha$ as $n \to \infty$.

Proof. Because $w$ and $u$ are standard Gaussian random vectors, so is $v$, and both $w^T w$ and $v^T v$ are
chi-squared random variables with $n$ degrees of freedom, both concentrated around their means, $n$.
Also, $v^T w = \cos\alpha\, w^T w + \sin\alpha\, u^T w$, and since $w^T w$ is concentrated around $n$ and $u^T w$ has mean zero
and variance $O(n)$, this is concentrated around $n\cos\alpha$. Hence, using a union bound, we find that with
probability approaching 1, $v^T w/(v^T v\, w^T w)^{1/2}$ lies between $(1 - \epsilon)\cos\alpha/(1 + \epsilon)$ and $(1 + \epsilon)\cos\alpha/(1 - \epsilon)$
for any positive $\epsilon$. This implies the result. □
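
The concentration in Proposition 1 is easy to observe empirically. The following sketch samples from Probability Distribution 1 and prints the realized angle for increasing $n$; the function names, the seed, and the choice $\alpha = 40^\circ$ are arbitrary and serve only as an illustration.

```python
import math
import random

def sample_distribution_1(n, alpha, rng):
    """Draw (w, v) with v = w cos(alpha) + u sin(alpha), w and u standard Gaussian."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    c, s = math.cos(alpha), math.sin(alpha)
    v = [c * wj + s * uj for wj, uj in zip(w, u)]
    return w, v

def angle(a, b):
    """Angle in radians between two nonzero vectors."""
    inner = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, inner / (norm_a * norm_b))))

rng = random.Random(0)  # fixed seed, chosen arbitrarily
alpha = math.radians(40)
for n in (2, 20, 200, 2000, 20000):
    w, v = sample_distribution_1(n, alpha, rng)
    print(n, round(math.degrees(angle(w, v)), 2))
```

For small $n$ the realized angle can be far from $\alpha$, which is one motivation for a second model in which the angle is exact.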

We now define our second model: 

Definition 4. We say $(w, v)$ is generated by Probability Distribution 2 if $\bar{w}$ and $\bar{u}$ are independent
standard Gaussian vectors in $\mathbb{R}^n$, $\hat{u} = (I - \bar{w}\bar{w}^T/\bar{w}^T\bar{w})\bar{u}$, $w = \bar{w}/\|\bar{w}\|$, $u = \hat{u}/\|\hat{u}\|$, and $v = w\cos\alpha +
u\sin\alpha$. Expectations with respect to this distribution are indicated by $E_2$.
Note that all the vectors are well-defined with probability one, with $w$ and $u$ orthogonal vectors
having unit norm, so that $w$ and $v$ are unit vectors making an angle $\alpha$ with probability one.
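
A sampler for Probability Distribution 2 follows the definition step by step: project one Gaussian vector orthogonally to the other, normalize both, and combine. The sketch below is a minimal illustration with hypothetical names (`sample_distribution_2`, `w_bar`, `u_bar`, `u_hat`); it assumes $n \ge 2$ so that the projected vector is nonzero with probability one.

```python
import math
import random

def sample_distribution_2(n, alpha, rng):
    """Draw unit vectors (w, u, v) with w orthogonal to u and angle(w, v) = alpha."""
    w_bar = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u_bar = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ww = sum(x * x for x in w_bar)
    wu = sum(x * y for x, y in zip(w_bar, u_bar))
    # Remove from u_bar its component along w_bar.
    u_hat = [uj - (wu / ww) * wj for wj, uj in zip(w_bar, u_bar)]
    norm_w = math.sqrt(ww)
    norm_u = math.sqrt(sum(x * x for x in u_hat))
    w = [x / norm_w for x in w_bar]
    u = [x / norm_u for x in u_hat]
    c, s = math.cos(alpha), math.sin(alpha)
    v = [c * wj + s * uj for wj, uj in zip(w, u)]
    return w, u, v

rng = random.Random(0)  # fixed seed, chosen arbitrarily
alpha = math.radians(25)
w, u, v = sample_distribution_2(5, alpha, rng)
cos_wv = sum(x * y for x, y in zip(w, v))
print(round(math.degrees(math.acos(cos_wv)), 6))
```

Up to rounding, the printed angle equals the prescribed 25 degrees even in dimension 5, in contrast with Probability Distribution 1.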

In both distributions, $w$ is generated according to some distribution, and $v$ is generated as a
perturbation of $w$. This fits with our interpretation of $w$ as the true objective and $v$ as a nearby
inaccurate objective. However, as we now show, we can alternatively regard $v$ as being generated first
and then $w$ as a perturbation of $v$. This is very useful in our analysis.

Proposition 2. In both Probability Distribution 1 and Probability Distribution 2, $(v, w) \sim (w, v)$.

Proof. Consider first Probability Distribution 1. Since the matrix

\[
\begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}
\]

is orthogonal, $(v, z) := (w\cos\alpha + u\sin\alpha,\ w\sin\alpha - u\cos\alpha) \sim (w, u)$. Then, since $(v, w)$ is derived from
the first pair exactly as $(w, v)$ is derived from the second, we see that $(v, w) \sim (w, v)$.

Next assume $(w, v)$ is generated according to Probability Distribution 2. Then $w$ lies on the unit
$(n-1)$-dimensional sphere $S^{n-1} := \{x \in \mathbb{R}^n : \|x\| = 1\}$, and $v = w\cos\alpha + u\sin\alpha$, where $u$ lies on the
unit $(n-2)$-dimensional sphere $S_w^{n-2} := \{x \in \mathbb{R}^n : w^T x = 0,\ \|x\| = 1\}$.

Let $Q$ be any orthogonal matrix. Since $(Q\bar{w}, Q\bar{u}) \sim (\bar{w}, \bar{u})$, and $Q\bar{w}/\|Q\bar{w}\| = Qw$, we see that
the distribution of $Qw$ coincides with that of $w$, so that $w$ is uniformly distributed on $S^{n-1}$. Also,
$(I - Q\bar{w}\bar{w}^T Q^T/(Q\bar{w})^T(Q\bar{w}))Q\bar{u} = Q(I - \bar{w}\bar{w}^T/\bar{w}^T\bar{w})\bar{u} = Q\hat{u}$, so that $(Q\bar{w}, Q\bar{u})$ gives rise to $Qw$ and
$Qu$ and hence $Qv = (Qw)\cos\alpha + (Qu)\sin\alpha$. It follows that $v$ is also distributed uniformly on $S^{n-1}$.

Moreover, even if $Q$ is taken to be an orthogonal matrix that fixes $\bar{w}$, so that $Q$ depends on $\bar{w}$, we
still have $Q\bar{w} = \bar{w}$ a standard Gaussian vector in $\mathbb{R}^n$ and $Q\bar{u}$ a standard Gaussian vector independent
of $\bar{w}$. Hence, proceeding as above, we see that $(\bar{w} = Q\bar{w}, Q\bar{u})$ gives rise to $Q\hat{u}$ and hence $Qu$, so since
$(Q\bar{w}, Q\bar{u}) \sim (\bar{w}, \bar{u})$, we find $Qu \sim u$, from which, conditioned on $w$, $u$ is uniformly distributed on $S_w^{n-2}$.

Finally, we show that $(v, z) := (w\cos\alpha + u\sin\alpha,\ w\sin\alpha - u\cos\alpha) \sim (w, u)$. Since $w = v\cos\alpha +
z\sin\alpha$, this will show that $(v, w) \sim (w, v)$ as desired. We have already shown that $v$, the first member
of the pair, is distributed uniformly on $S^{n-1}$. We now consider all pairs $(w, u)$ that give rise to a given
$v$. If $Q$ is an orthogonal matrix that fixes $v$, then $(Qw, Qu)$ also gives rise to the same $v$. Since $Q$ is
orthogonal, the distribution of $(w, u)$, conditional on this fixed $v$, is invariant under pre-multiplication
of each vector by such a $Q$. As we have seen, under this transformation $w$ is transformed to $Qw$ and
$u$ to $Qu$, and hence $z := w\sin\alpha - u\cos\alpha$ is transformed to $Qz$. It follows that $z$, which has unit norm
and is orthogonal to $v$, is uniformly distributed on $S_v^{n-2}$. This concludes the proof. □

We are now ready to analyze the behavior of the scaled loss "on average" for our two models. First 
we investigate the range function: 

Lemma 1. For $i = 1, 2$, we have

\[
E_i \max(v) = E_i \max(w), \qquad E_i \operatorname{range}(w) = 2E_i \max(w).
\]

Proof. The first equation follows from Proposition 2 above, since $v$ and $w$ have the same distribution.
For the second equation, note that

\[
E_i \operatorname{range}(w) = E_i \max(w) - E_i \min(w) = E_i \max(w) + E_i \max(-w),
\]

and that $E_i \max(-w) = E_i \max(w)$ since under both probabilistic models, $w$ has a symmetric
distribution. □

Next we examine the loss:

Lemma 2. For $i = 1, 2$, we have

\[
E_i \operatorname{loss}(v, w) = (1 - \cos\alpha)\,E_i \max(w).
\]

Proof. First note that, since $C$ is compact, the convex function $\max(v)$ is finite everywhere, and
hence is differentiable almost everywhere, with respect to Lebesgue measure and hence with respect
to Probability Distribution 1. But $\max(v)$ is differentiable at $v$ exactly when the maximum of $v^T x$
over $C$ is attained at a single $x$, which we denote by $x_v$. Since this property is invariant under positive
scalings of $v$, we see that it holds for almost all $v$ under Probability Distribution 2 also.
Hence with probability one,

\[
\operatorname{loss}(v, w) = \max(w) - w^T x_v = \max(w) - (v\cos\alpha + z\sin\alpha)^T x_v,
\]

where, as in the proof of Proposition 2, we let $z := w\sin\alpha - u\cos\alpha$. Now in either model, $z$ has a
symmetric distribution conditional on $v$, and since with probability one $x_v$ depends only on $v$, $z^T x_v$
has mean zero. Hence

\[
E_i \operatorname{loss}(v, w) = E_i \max(w) - \cos\alpha\, E_i \max(v) - 0,
\]

and the result follows from Lemma 1. □

From Lemmas 1 and 2 we immediately deduce

Theorem 2. For $i = 1, 2$, we have

\[
\frac{E_i \operatorname{loss}(v, w)}{E_i \operatorname{range}(w)} = \frac{1 - \cos\alpha}{2}.
\]

□
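
Theorem 2 can be probed by simulation. The sketch below estimates the ratio of the expected loss to the expected range under Probability Distribution 1 for a small, deliberately lopsided finite set $C \subset \mathbb{R}^3$; the set, the sample size, and the seed are arbitrary choices made for illustration, and the estimate is subject to Monte Carlo error.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# An arbitrary, asymmetric finite feasible set in R^3 (illustrative only).
C = [(2.0, 0.5, -1.0), (-0.5, 1.5, 0.0), (0.0, -1.0, 3.0),
     (1.0, 1.0, 1.0), (-2.0, -0.5, 0.5), (0.5, -2.5, -1.5)]

def estimate_ratio(alpha, samples, rng):
    """Monte Carlo estimate of E_1 loss(v, w) / E_1 range(w) over the set C."""
    n = len(C[0])
    c, s = math.cos(alpha), math.sin(alpha)
    total_loss = total_range = 0.0
    for _ in range(samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [c * wj + s * uj for wj, uj in zip(w, u)]
        x_v = max(C, key=lambda x: dot(v, x))  # unique with probability one
        w_values = [dot(w, x) for x in C]
        total_loss += max(w_values) - dot(w, x_v)
        total_range += max(w_values) - min(w_values)
    return total_loss / total_range

rng = random.Random(0)  # fixed seed, chosen arbitrarily
for degrees in (10, 30, 60):
    alpha = math.radians(degrees)
    print(degrees, round(estimate_ratio(alpha, 20000, rng), 4),
          round((1.0 - math.cos(alpha)) / 2.0, 4))
```

Each printed line pairs the simulated ratio with the value $(1 - \cos\alpha)/2$ predicted by the theorem.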

Note that we do not have a result on the expected scaled loss, which would be an expectation of 
the ratio of the loss to the range, but only on the ratio of the expectations, which is why we have put 
"on average" in quotes above. 

4 Discussion and applications 

It seems at first that the theorem of the last section would hold under much weaker probabilistic
assumptions, merely requiring that $w$ and $u$ have symmetric distributions. Unfortunately, there are
two problems with this. First, what we really need is that $v$ and $z$ have symmetric distributions,
but putting restrictions on $v$ and $z$ conflicts with the natural interpretation that the true objective $w$
should be generated first, and then $v$ as a perturbation of $w$. Second, it is crucial that $\max(v)$ and
$\max(w)$ have the same expectation, and this appears hard to ensure under weaker assumptions: the
fact that $(v, w)$ and $(w, v)$ have the same distribution under our two models is key in our development.

One way in which the result can be generalized is in allowing a random choice of $\alpha$. Our two
models yield vectors $v$ and $w$ making an angle that either concentrates around $\alpha$ or is exactly $\alpha$.
Instead, we can consider probability distributions on the triple $(\alpha, w, v)$ as follows: first $\alpha$ is generated
according to an arbitrary distribution supported on $(0, \pi/2)$; then, conditional on $\alpha$, $w$ and $v$ are
generated according to Probability Distribution 1 or 2. It is easy to see that all our arguments can
be extended by first conditioning on $\alpha$, and the expected loss divided by the expected range will be
$(1 - E(\cos\alpha))/2$, where the expectation is taken with respect to the distribution on $\alpha$.
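
As a small worked instance of this extension, suppose (purely as an assumption for illustration) that $\alpha$ is uniform on $(0, \pi/2)$. Then $E(\cos\alpha) = 2/\pi$, so the predicted ratio is $(1 - 2/\pi)/2 \approx 0.18$. The sketch below compares this closed form with a sample average; the function name `predicted_ratio` is ours.

```python
import math
import random

def predicted_ratio(alpha_samples):
    """(1 - E[cos(alpha)]) / 2, with the expectation replaced by a sample mean."""
    mean_cos = sum(math.cos(a) for a in alpha_samples) / len(alpha_samples)
    return (1.0 - mean_cos) / 2.0

rng = random.Random(0)  # fixed seed, chosen arbitrarily
# Assumed distribution for alpha: uniform on (0, pi/2).
alphas = [rng.uniform(0.0, math.pi / 2.0) for _ in range(100000)]
closed_form = (1.0 - 2.0 / math.pi) / 2.0
print(round(predicted_ratio(alphas), 4), round(closed_form, 4))
```

Because the cosine is concave on this interval, averaging over $\alpha$ gives a larger ratio than plugging in the mean angle: here $(1 - \cos(\pi/4))/2 \approx 0.146$.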

Another generalization allows very general distributions, but changes the way the objective vector
is perturbed. Let $f_j$ be a symmetric probability density on $\mathbb{R}$ for $j = 1, \ldots, n$. For each $j$, draw $w_j$
and $u_j$ independently from $f_j$, and then let $v_j$ be $w_j$ with probability $\cos\alpha$ and $u_j$ with probability
$1 - \cos\alpha$, with all the choices independent. Let $t \in \mathbb{R}^n$ be defined by $t_j = +1$ if $v_j = w_j$, $t_j = -1$
if $v_j = u_j$, so each $t_j$ is $+1$ with probability $\cos\alpha$ and $-1$ with probability $1 - \cos\alpha$. Then it is
clear how $(w, v)$ arises as a function of the triple $(w, u, t)$. But if we define $z := w + u - v$, then in
each component, $z$ agrees with $w$ ($u$) exactly when $v$ agrees with $u$ ($w$). It follows that $(v, z, t)$ has
the same distribution as $(w, u, t)$, and $(v, w)$ arises from $(v, z, t)$ as does $(w, v)$ from $(w, u, t)$. Hence
$(v, w)$ and $(w, v)$ have the same distribution. Moreover, the arguments of the previous section can be
duplicated, and again lead to the result that the expected loss divided by the expected range is exactly
$(1 - \cos\alpha)/2$. Under mild conditions on the $f_j$'s, the angle between $v$ and $w$ concentrates around $\alpha$.
Note that, in this model, a small fraction of the components are changed a possibly large amount,
while in the previous models, each component is changed a small amount.
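
The component-swap model is straightforward to simulate. In the sketch below, each $f_j$ is taken (as an assumption for illustration) to be the uniform density on $[-a_j, a_j]$, and the feasible set is the set of binary vectors $\{0, 1\}^n$, for which the maximum, the minimum, and the nominal optimizer separate by component. All names, scales, and the seed are hypothetical.

```python
import math
import random

def sample_swap_model(scales, alpha, rng):
    """Each v_j equals w_j with probability cos(alpha), else an independent copy u_j."""
    p = math.cos(alpha)
    w = [rng.uniform(-a, a) for a in scales]
    u = [rng.uniform(-a, a) for a in scales]
    t = [1 if rng.random() < p else -1 for _ in scales]
    v = [wj if tj == 1 else uj for wj, uj, tj in zip(w, u, t)]
    return w, u, v, t

def estimate_ratio(scales, alpha, samples, rng):
    """Estimate E loss / E range for C = {0,1}^n under the swap model."""
    total_loss = total_range = 0.0
    for _ in range(samples):
        w, u, v, t = sample_swap_model(scales, alpha, rng)
        # Over binary vectors, max(w) collects the positive entries of w,
        # min(w) the negative ones, and x_v is the indicator of v_j > 0.
        max_w = sum(max(wj, 0.0) for wj in w)
        min_w = sum(min(wj, 0.0) for wj in w)
        achieved = sum(wj for wj, vj in zip(w, v) if vj > 0.0)
        total_loss += max_w - achieved
        total_range += max_w - min_w
    return total_loss / total_range

rng = random.Random(0)  # fixed seed, chosen arbitrarily
scales = (1.0, 2.0, 0.5, 3.0)
alpha = math.radians(60)
print(round(estimate_ratio(scales, alpha, 20000, rng), 4),
      round((1.0 - math.cos(alpha)) / 2.0, 4))
```

The two printed numbers should agree up to sampling error, even though the components here are uniform rather than Gaussian.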

We argued in the introduction that the scaled loss provided a good measure of how much is lost
in objective value when implementing the optimal solution for a misspecified objective. However, the
result of the previous section is concerned with the ratio of the expectations of the loss and the range,
rather than the more meaningful expectation of the ratio. We therefore conducted some experiments
with two NETLIB [9] problems, AGG and BOEING1, to see how much the results differ. The first
has 489 rows and 163 columns, the second 351 rows and 384 columns. Figures 3 and 4, which give
graphs of the expectation of the ratio and of the ratio of the expectations as functions of the angle
$\alpha$ in degrees, show that our results should be applicable to the more meaningful measure also. In
both figures, each data point is obtained using at least 10,000 pairs $(v, w)$ generated from Probability
Distribution 1.

Figure 3: Ratio of expectations versus expectation of ratios for AGG.

Figure 4: Ratio of expectations versus expectation of ratios for BOEING1.
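
The two measures can also be compared on a toy instance. The sketch below is not a reproduction of the NETLIB experiments: it uses a hypothetical five-point feasible set in $\mathbb{R}^2$ and Probability Distribution 1, and simply reports both the ratio of expectations and the expectation of the ratio for a few angles.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Toy feasible set in R^2 (illustrative only; unrelated to AGG or BOEING1).
C = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.5, -1.0), (-0.5, -0.5)]

def compare_measures(alpha, samples, rng):
    """Return (ratio of expectations, expectation of ratio) under Distribution 1."""
    n = len(C[0])
    c, s = math.cos(alpha), math.sin(alpha)
    sum_loss = sum_range = sum_ratio = 0.0
    for _ in range(samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [c * wj + s * uj for wj, uj in zip(w, u)]
        x_v = max(C, key=lambda x: dot(v, x))  # nominal optimizer
        w_values = [dot(w, x) for x in C]
        loss = max(w_values) - dot(w, x_v)
        value_range = max(w_values) - min(w_values)  # positive with probability one
        sum_loss += loss
        sum_range += value_range
        sum_ratio += loss / value_range
    return sum_loss / sum_range, sum_ratio / samples

rng = random.Random(0)  # fixed seed, chosen arbitrarily
for degrees in (10, 30, 60):
    alpha = math.radians(degrees)
    ratio_of_exp, exp_of_ratio = compare_measures(alpha, 20000, rng)
    print(degrees, round(ratio_of_exp, 4), round(exp_of_ratio, 4))
```

Only the first printed quantity is covered by Theorem 2; how close the second comes to it on a given feasible set is an empirical question.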

Our result is limited to clarifying what happens when linear objective functions are perturbed.
In general, little can be said for nonlinear functions, partially because it is not clear how random
nonlinear objective functions and their perturbations should be defined. However, our analysis can
be applied to one case where objective functions are nonlinear. Suppose there are several continuous
objective functions, $f_i(x)$ for $i = 1, \ldots, k$. The decision maker is interested in high values of all of
these objectives, so we are in the realm of multi-criteria optimization; see, for instance, [7], [8]. Often,
a linear combination $\sum_i w_i f_i(x)$ of the objectives is maximized, and indeed, any such optimal solution
for a positive $w$ is an efficient or Pareto-optimal solution. (The converse is true when all $f_i$'s are
concave, and in this case the optimization problem is convex, but this restriction is not needed for our
discussion.) However, since the different functions $f_i$ may be hard to compare, it is difficult to decide
on appropriate weights $w$. Our theorem indicates in some sense that the choice may not matter too
much.

Let $Y := \{y = (f_1(x); \ldots; f_k(x)) : x \in X\}$. Since $X$ is nonempty and compact, so is $Y$, and trivially
$\max\{\sum_i w_i f_i(x) : x \in X\} = \max\{w^T y : y \in Y\}$. Theorem 2 shows that the latter is insensitive in some
precise sense to the specification of the objective $w$, and this translates directly into the insensitivity
of the original problem's optimal value to the specification of the weights. Of course, there is a large
caveat here: the result requires $v$ and $w$ to be randomly chosen from symmetric distributions, and
hence their components are as likely to be negative as positive, while in the multi-criteria setting, the
weights are always positive. Nevertheless, we believe our theorem gives some credence to the hope
that incorrect choices of weights should not hurt much.
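
The reduction to a linear objective over $Y$ can be made concrete. In the sketch below, everything specific is an assumption for illustration: the two criteria `f1` and `f2`, the finite grid standing in for a compact $X$, and the weight vectors. It checks that the weighted-sum maximum over $X$ equals the linear maximum over $Y$, and then evaluates the scaled loss of one particular misspecified weight vector.

```python
def f1(x):
    return x              # hypothetical criterion 1

def f2(x):
    return 1.0 - x * x    # hypothetical criterion 2

X = [i / 100.0 for i in range(101)]  # finite grid standing in for a compact X
Y = [(f1(x), f2(x)) for x in X]      # image of X under (f1; f2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def scaled_loss(points, v, w, tol=1e-12):
    """Scaled loss of Definition 2 over a finite set, ties resolved pessimistically."""
    max_v = max(dot(v, y) for y in points)
    argmax_v = [y for y in points if dot(v, y) >= max_v - tol]
    w_values = [dot(w, y) for y in points]
    loss = max(w_values) - min(dot(w, y) for y in argmax_v)
    return loss / (max(w_values) - min(w_values))

w = (0.7, 0.3)  # "true" weights (assumed)
v = (0.5, 0.5)  # weights actually used (assumed)
best_over_X = max(w[0] * f1(x) + w[1] * f2(x) for x in X)
best_over_Y = max(dot(w, y) for y in Y)
print(best_over_X, best_over_Y)
print(scaled_loss(Y, v, w))
```

A single instance such as this one need not match the average behavior described by Theorem 2, particularly since the weights here are restricted to be positive.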

Our second application is to the complexity of combinatorial optimization problems. Beier and
Vöcking [1] and Röglin and Teng [10] have uncovered a fascinating connection between binary
optimization problems that can be solved in randomized pseudo-polynomial time, that is, in randomized
polynomial time if the data are encoded in unary, and smoothed complexity. In particular, Röglin and
Teng show that such a problem can be solved in expected time polynomial in the input size and $1/\sigma$,
where the adversarially chosen objective function coefficients are perturbed by independent Gaussians
with mean zero and variance $\sigma^2$ (there are slight technical subtleties; see Sections 2 and 6 of [10]).
This is normally interpreted as saying that arbitrarily close to any potentially hard instance there
are polynomially solvable instances. This gives support to the belief that one would be unlucky to
choose a bad instance, in a rather precise and strong sense. Our result provides another avenue to
solving such problems. One can explicitly make a small random perturbation of the objective function
coefficients, thereby obtaining a problem with provably expected polynomial-time complexity. (Note
that $w + z$, where $z$ has independent zero-mean Gaussian components with standard deviation $\sigma$, is
proportional to $w\cos\alpha + u\sin\alpha$, where $u$ is a standard Gaussian random vector and $\alpha := \arctan\sigma$.) Solving this
perturbed problem gives a feasible solution to the original problem, and Theorem 2 gives credence
to the hope that this solution will be close to optimal for the true objective function. (For some
models of generating $w$, renormalizing might give a distribution on $\alpha$, rather than a fixed value, but
the extension mentioned at the beginning of this section allows for this possibility.) Of course, our
result only proves this "on average," so one would be unlucky to have an objective function where the
loss is large, in a certain sense. We believe this viewpoint provides further insight into the notion of
smoothed complexity, at least when only the objective function is perturbed.
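
The proportionality noted in the parenthetical remark follows from $\cos(\arctan\sigma) = 1/\sqrt{1 + \sigma^2}$ and $\sin(\arctan\sigma) = \sigma/\sqrt{1 + \sigma^2}$, so that $w + \sigma u = \sqrt{1 + \sigma^2}\,(w\cos\alpha + u\sin\alpha)$. The sketch below verifies this identity numerically; the vector `w`, the value of `sigma`, and the function name are arbitrary illustrative choices.

```python
import math
import random

def normalized_perturbation(w, u, sigma):
    """Return alpha = arctan(sigma) and the vector w cos(alpha) + u sin(alpha)."""
    alpha = math.atan(sigma)
    c, s = math.cos(alpha), math.sin(alpha)
    return alpha, [c * wj + s * uj for wj, uj in zip(w, u)]

rng = random.Random(0)  # fixed seed, chosen arbitrarily
sigma = 0.1
w = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]   # stands in for a given objective
u = [rng.gauss(0.0, 1.0) for _ in w]   # standard Gaussian direction
perturbed = [wj + sigma * uj for wj, uj in zip(w, u)]  # w + z with z = sigma * u

alpha, v = normalized_perturbation(w, u, sigma)
scale = math.sqrt(1.0 + sigma * sigma)
max_gap = max(abs(pj - scale * vj) for pj, vj in zip(perturbed, v))
print(round(math.degrees(alpha), 3), max_gap)
```

Since positive scaling of the objective does not change the maximizer, solving with `perturbed` and solving with `v` select the same solution.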

References 

[1] R. Beier and B. Vöcking, Random knapsack in expected polynomial time, Journal of Computer
and System Sciences 69 (2004), pp. 306-329.

[2] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, eds., Robust Optimization, Mathematical
Programming 107, Numbers 1-2, 2006.

[3] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press,
Princeton, NJ, 2009.

[4] A. Ben-Tal and A. Nemirovski, Robust solutions of uncertain linear programs, Operations
Research Letters 25 (1999), pp. 1-13.

[5] A. Ben-Tal and A. Nemirovski, Robust solution of Linear Programming problems contaminated
with uncertain data, Mathematical Programming 88 (2000), pp. 411-424.

[6] D. Bertsimas and M. Sim, Tractable approximations to robust conic optimization problems,
Mathematical Programming 107 (2006), pp. 5-36.

[7] I. Das and J.E. Dennis, A closer look at drawbacks of minimizing weighted sums of objectives for
Pareto set generation in multicriteria optimization problems, Structural Optimization 14 (1997),
pp. 63-69.

[8] C. Hillermeier, Nonlinear Multiobjective Optimization: A Generalized Homotopy Approach,
Birkhäuser, Basel, 2001.

[9] NETLIB linear programming problems, available from http://www.netlib.org/lp/data

[10] H. Röglin and S.-H. Teng, Smoothed analysis of multiobjective optimization, in: Proceedings of
the 50th Annual IEEE Symposium on Foundations of Computer Science, 2009, pp. 681-690.






